Polycystic ovary syndrome (PCOS) is a syndrome documented in women of reproductive age.
Commonly documented symptoms include period pain, irregular periods, ovary-related problems, and hormone imbalance.
Patients with PCOS often have problems related to pregnancy, including potential complications during pregnancy.
However, the cause of PCOS has still not been established.
The aim of this study is to examine a data set (found on Kaggle) of patients with and without PCOS. The data were collected in India at 10 different hospitals.
Raw data:
541 observations of 45 variables
01_load_data:
Simply loads the data
02_clean_data:
03_augment:
library(dplyr)

# Round BMI to one decimal and divide it into categories
body_measurements <- body_measurements |>
  mutate(BMI = round(BMI, 1)) |>
  mutate(BMI_class = case_when(
    BMI < 18.5 ~ "Underweight",
    BMI >= 18.5 & BMI < 25 ~ "Normal weight",
    BMI >= 25 & BMI < 30 ~ "Overweight",
    BMI >= 30 ~ "Obesity")) |>
  # Fix the level order so the classes are listed from low to high BMI
  mutate(BMI_class = factor(BMI_class,
                            levels = c("Underweight",
                                       "Normal weight",
                                       "Overweight",
                                       "Obesity"))) |>
  # Place the new class column right after the BMI column
  relocate(BMI_class, .after = BMI)
Dimensions:
# A tibble: 2 × 1
`PCOS dimensions`
<int>
1 541
2 44
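A dimensions table of this shape can be produced by wrapping `dim()` in a tibble. The following is only a minimal sketch on a small, hypothetical toy data frame (`toy` and its values are assumptions, not the study's data):

```r
library(tibble)

# Hypothetical toy data standing in for the cleaned data set
toy <- tibble(id = 1:3, BMI = c(21.4, 27.0, 31.2))

# dim() returns c(number of rows, number of columns) as an integer vector
dims <- tibble(`PCOS dimensions` = dim(toy))

dims
```

The first row of the result is the number of observations and the second row is the number of variables.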
Count of how many have PCOS:
# A tibble: 2 × 2
PCOS_diagnosis n
<chr> <int>
1 No 364
2 Yes 177
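A count table like the one above could be obtained with `dplyr::count()`. This is a minimal sketch that assumes a data frame with a `PCOS_diagnosis` column; the toy values below are hypothetical and are not the study's data:

```r
library(dplyr)

# Hypothetical toy data standing in for the cleaned data set
pcos_toy <- tibble(PCOS_diagnosis = c("No", "Yes", "No", "No", "Yes"))

# Number of rows in each diagnosis group
diagnosis_counts <- pcos_toy |>
  count(PCOS_diagnosis)

diagnosis_counts
```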
Follicle number and PCOS diagnosis:
Slight divergence between PCOS and non-PCOS individuals in body measurements.
No divergence between PCOS-diagnosed and non-PCOS-diagnosed individuals.
Distribution between women diagnosed with and without PCOS
PCA plots show that only a few variables (FSH and LH) are relevant for diagnosing PCOS
Body parameters (BMI) show little clustering
Not an optimal data set for drawing significant conclusions
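For readers unfamiliar with the PCA step mentioned above, the following is a minimal sketch using base R's `prcomp()`. The data are simulated and hypothetical (the column names FSH, LH and BMI are borrowed from the text, the values are not from the study), and this is not necessarily how the analysis in this project was run:

```r
# Simulated measurements (hypothetical values, not the study's data)
set.seed(1)
toy <- data.frame(
  FSH = rnorm(50, mean = 6, sd = 2),
  LH  = rnorm(50, mean = 5, sd = 2),
  BMI = rnorm(50, mean = 24, sd = 4)
)

# PCA on centred and scaled variables so that units do not dominate
pca_fit <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)

# Loadings: how strongly each variable contributes to each component
pca_fit$rotation
```

Variables with large absolute loadings on the first components are the ones that drive the separation seen in a PCA plot.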